Abstract: One of the main heritage tools used in the spectral analysis of scientific and engineering data is the Fourier integral transform and its high-performance digital equivalent, the Fast Fourier Transform (FFT). The FFT is particularly useful in two-dimensional (2-D) image processing (FFT2) for the control of optical systems. However, the timing constraints of a fast optics closed control loop would require a supercomputer to run a software implementation of the FFT2 and its inverse, as well as other representative image processing algorithms such as numerical image folding and fringe feature extraction. A laboratory supercomputer is not always available even for ground operations, and is not feasible for a flight project. These computationally intensive algorithms therefore warrant an alternative implementation using reconfigurable computing (RC) technologies such as Digital Signal Processors (DSPs) and Field Programmable Gate Arrays (FPGAs), which provide low-cost, compact supercomputing capabilities. We present a new RC hardware implementation and utilization architecture that significantly reduces the computational complexity of a few basic image processing algorithms, such as FFT2, image folding and phase diversity, for the NASA Solar Viewing Interferometer Prototype (SVIP), using a cluster of DSPs and FPGAs. The DSP cluster utilization architecture also avoids a single point of failure while using commercially available hardware. Combined with pre-hardware optimization of the control algorithms, this allows, for the first time, the construction of image-based 800 Hertz (Hz) optics closed control loops on board a spacecraft based on the SVIP ground instrument. That spacecraft is the proposed Earth Atmosphere Solar-Occultation Imager (EASI), which would study the greenhouse gases CO2, C2H, H2O, O3, O2 and N2O from the Lagrange-2 point in space. This paper provides insight into a new type of science capability for future space exploration missions, based on on-board image processing for control and for robotics missions using vision sensors. It presents a top-level description of the technologies required for the design and construction of SVIP and EASI and to advance spatial-spectral imaging and large-scale space interferometry science and engineering.
Introduction

Reconfigurable Computing technologies, such as Digital Signal Processors and Field Programmable Gate Arrays, outperform general-purpose computer platforms by at least an order of magnitude for certain applications ([2], [3]). For the DSP in particular, this is due to the following architectural differences from a general-purpose processor:

- The DSP's Harvard architecture differs from the von Neumann architecture in having separate instruction and data paths, which allows instructions and data to be handled in parallel on the same processor.

- A DSP performs integer or floating-point addition, multiplication and a few other basic arithmetic operations (those that make up the FFT) in one system clock cycle.

- Further parallelism in algorithm implementation stems from the ability to configure DSPs into a cluster, which reduces the FFT2 to multiple 1-D FFTs.

- A DSP executes target code without interruption, whereas a general-purpose processor incurs operating-system task time-slice delays. As a result, a DSP can outperform a Pentium even on division in a long loop.

- DSPs allow an application design to propagate faster to a flight system by augmenting the slow state-of-the-art flight general-purpose processors that run commercial operating systems.
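The third point above can be illustrated with a small C sketch that splits the rows of an image among a hypothetical cluster of DSPs, so that each node runs its own independent 1-D row FFTs. The helper name and the cluster size used in the example are assumptions for illustration, not part of the SVIP design.

```c
#include <assert.h>

/* Hypothetical helper: give cluster node `node` (0-based) a contiguous
 * block of image rows, splitting `rows` as evenly as possible across
 * `num_dsps` nodes. The first (rows % num_dsps) nodes get one extra row.
 * Each node then runs independent 1-D FFTs on its own rows. */
void partition_rows(int rows, int num_dsps, int node,
                    int *first_row, int *row_count)
{
    assert(rows >= 0 && num_dsps > 0 && node >= 0 && node < num_dsps);
    int base  = rows / num_dsps;
    int extra = rows % num_dsps;
    *row_count = base + (node < extra ? 1 : 0);
    *first_row = node * base + (node < extra ? node : extra);
}
```

For a 1024-row image and six nodes, nodes 0 to 3 each receive 171 rows and nodes 4 and 5 each receive 170. The column pass that completes the FFT2 needs the same kind of split, plus an exchange (transpose) of data between nodes over the inter-DSP links.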

1. Flight Mission 

The flight mission is the Earth Atmosphere Solar-Occultation Imager (EASI), which would study greenhouse gases from the Lagrange-2 point (L2) in space (Figure 1). It would provide a complete set of science data for the Earth's atmosphere in 24 hours. The Solar Viewing Interferometer Prototype (Figure 5) is the instrument being developed for ground testing of the advanced concepts and technologies required for the EASI mission, and it is the SVIP that is primarily described in this paper. The RC technologies required for the SVIP fast optics control loops are determined by the SVIP science requirements.



The Earth's atmospheric limb is perpetually illuminated by the Sun and scanned by EASI. The rectangular scan area, shown at the bottom of Figure 1, is imaged onto the SVIP Charge Coupled Device (CCD) sensors.

2. Theoretical Requirements of the Software Algorithms

The SVIP instrument comprises several subsystems, among them the Fast Optics Control Subsystem (FOCS). The other SVIP subsystems are similar to heritage instruments. The FOCS, however, is new, and it is defined by the critical science requirements on the control algorithms. This subsystem consists of components that require on-board high-speed control, based on high-frequency wavefront image sensing and image data processing, which enables the stabilization of images. The science requirements for SVIP were established by the Principal Investigators (PIs), Dr. Jay R. Herman and Richard G. Lyon (Tables 1 and 2).


Data Source 1: Differential Piston Sensors (DP2)

Number of CCDs: 2
CCD frame size: 1024 x 128 pixels
CCD pixel size: 12 bits (stored in 2 bytes)
CCD readout rate: 800 Hz

Table 1. Piston Science Requirements


Data Source 2: Tip-Tilt Fringe Sensors (TT3)

Number of CCDs: 3
CCD frame size: 256 x 256 pixels
CCD pixel size: 12 bits (stored in 2 bytes)
CCD readout rate: 800 Hz

Table 2. Tip-Tilt Science Requirements


There are two types of digital image data sources within the SVIP: the piston CCD sensors and the tip-tilt CCD sensors. The data throughput volumes and rates, and the computational complexity requirements, derived from the above science requirements are given in Tables 3 and 4. The image sensors used to control the fast optics piston delay lines and tip-tilt orientation are the two piston CCDs and the three tip-tilt CCDs. The sensors' sizes in pixel rows and columns, and their readout rates, were determined in extensive simulations of the control algorithms that account for atmospheric turbulence.


Differential Piston Sensor CCD Output Data Volume (DPCVi, i = 1, 2), in bytes, consists of the readout from 2 CCDs at a rate of 800 Hz:

Raw frame size (pixels) = 1024 x 128 = 131,072
Raw frame size (bytes) = 131,072 x 2 = 262,144
DPCV1 = 262,144 x 800 = 209,715,200 ≈ 210 MBs
DPCV2 ≈ 210 MBs

DPCV total output of raw data ≈ 0.42 GBs,
where MBs denotes million bytes per second and GBs denotes billion (giga) bytes per second.

Table 3. Piston Derived Requirements


Tip/Tilt Sensor CCD Output Data Volume (TTCVi, i = 1, 2, 3), in bytes, consists of the readout from 3 CCDs at 800 Hz:

Raw frame size (pixels) = 256 x 256 = 65,536
Raw frame size (bytes) = 65,536 x 2 = 131,072 ≈ 0.13 MB
TTCV1 = 131,072 x 800 = 104,857,600 ≈ 105 MBs
TTCV2 ≈ 105 MBs
TTCV3 ≈ 105 MBs

TTCV minimal total output of raw data ≈ 0.32 GBs

Table 4. Tip-Tilt Derived Requirements
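The arithmetic in Tables 3 and 4 can be checked with a few lines of C. This is a sketch; the function names are illustrative, and each 12-bit pixel is assumed to occupy a 2-byte word, as stated in Tables 1 and 2.

```c
#include <assert.h>
#include <stdint.h>

/* Raw output rate of one CCD in bytes per second: rows * cols pixels,
 * bytes_per_pixel bytes each, read out frames_per_s times per second. */
uint64_t ccd_bytes_per_second(uint32_t rows, uint32_t cols,
                              uint32_t bytes_per_pixel,
                              uint32_t frames_per_s)
{
    assert(rows > 0 && cols > 0 && bytes_per_pixel > 0);
    return (uint64_t)rows * cols * bytes_per_pixel * frames_per_s;
}

/* Combined raw rate of the five SVIP control CCDs (Tables 1 to 4). */
uint64_t focs_total_bytes_per_second(void)
{
    uint64_t piston  = 2 * ccd_bytes_per_second(1024, 128, 2, 800);
    uint64_t tiptilt = 3 * ccd_bytes_per_second(256, 256, 2, 800);
    return piston + tiptilt;
}
```

The exact totals are 419,430,400 bytes/s for the two piston CCDs and 314,572,800 bytes/s for the three tip-tilt CCDs, or 734,003,200 bytes/s combined; summing the rounded table values gives the 0.74 GBs used below.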

1.1 Fast Optics Control System Computational Complexity

The complexity of the FOCS comprises its on-board data volumes, data rates and computational complexity. These require sensors and computational technologies that are beyond the state of the art of heritage spacecraft technology. However, the task can now be accomplished with emerging, commercially available reconfigurable computing technologies. Given the FOCS data volumes and the real-time processing requirement of 800 Hz, or one cycle every 1.25 milliseconds (ms), reconfigurable computing and clusters of Digital Signal Processors are the obvious choice for the cardinal elements of the FOCS. A DSP-based architecture, consisting of a few DSP board clusters with embedded FPGA processing elements (PEs), is available in Compact Peripheral Component Interconnect (cPCI) ground and flight configurations (Figure 6). DSPs are also readily available for floating-point or fixed-point arithmetic operations.

1.2 SVIP FOCS Parametric Characteristics and Processes

The parametric characteristics and processes of the SVIP FOCS can be summarized as follows:

- Ingest on board a total of 0.74 GBs of digital image data from 5 CCD sensors

- Reformat the CCD digital data values into 1.5 GBs of single-precision floating-point or integer numbers and store the data in DSP board external memory

- Process the reformatted 1.5 GBs of digital data using floating-point or integer arithmetic in 0.5 milliseconds per control cycle. The processing algorithms' complexity is similar to that of an FFT2 of a 1024 x 1024 array of floating-point numbers at 800 Hz

- Synchronize all on-board I/O operations and computations 

- Construct fast optics control loop digital commands in real 
time at 800 Hz. 
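A minimal sketch of the reformatting step is shown below, assuming each 12-bit pixel arrives right-justified in a 16-bit word (the actual bit layout of the FPGA formatter output is not specified in this paper). The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Unpack 12-bit CCD samples, assumed right-justified in 16-bit words,
 * into single-precision floats. The output uses 4 bytes per pixel
 * instead of 2, which is why 0.74 GBs of raw data becomes roughly
 * 1.5 GBs after reformatting. */
void reformat_frame(const uint16_t *raw, float *out, size_t n_pixels)
{
    assert(raw != NULL && out != NULL);
    for (size_t k = 0; k < n_pixels; ++k) {
        /* Keep only the 12 significant bits; discard the unused upper bits. */
        out[k] = (float)(raw[k] & 0x0FFFu);
    }
}
```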

Each CCD is located in a focal plane (FP), with its 2-D detector area parallel to that focal plane. The control algorithm theory makes the strong assumption that either all CCDs are located in a single focal plane, or the CCDs are located in a few focal planes whose detector planes are at known angles to a single vector direction in 3-D space, in some chosen coordinate system attached to the instrument.
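One way to make this geometric assumption explicit, in notation introduced here only for illustration, is to let n_k be the normal of the detector plane of CCD k and u the chosen reference direction in the instrument frame. The assumption is then that each angle θ_k defined below is known:

```latex
\cos\theta_k = \frac{\mathbf{n}_k \cdot \mathbf{u}}{\lVert \mathbf{n}_k \rVert \, \lVert \mathbf{u} \rVert},
\qquad k = 1, \dots, 5
```

The angle between the detector plane itself and u is then 90° − θ_k. In the single-focal-plane case all n_k are parallel, so every θ_k is the same.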

The theoretical and engineering basis for the SVIP FOCS mirror motion algorithms also assumes the availability of reconfigurable supercomputing resources such as DSPs and FPGAs.

A DSP hardware implementation of image-based wavefront sensing and digital image data processing for fast optics control requires high-data-rate communications between the image sensor electronics and the DSP hardware. A 2-D image sensor, such as a charge coupled device, detects the light wavefront; the detected signal is then conditioned, digitized and transmitted by the sensor electronics to the external memory of a DSP board by way of an intermediate FPGA formatter. The computationally intensive image processing algorithms compute the wavefront phase error or interferometry functions, and the resulting information is used to correct the image and to generate digital commands in a feedback loop to the fast optics control unit. The digital images are processed using the novel Phase Diversity and Misell algorithms to recover the image wavefront phase error. They are also processed using Fizeau interferometry-based algorithms for motion control. Richard G. Lyon developed the software algorithms for SVIP. These algorithms are inherently complex, however, and require on-board supercomputing capabilities and high sampling rates at the optics' focal plane CCDs.

These algorithms were recently refined and significantly accelerated at the NASA Goddard Space Flight Center (GSFC) and represent a breakthrough in signal processing technology. They enable a new class of science and engineering exploration themes using novel Fizeau interferometry in space [1]. The remaining bottleneck is the rate at which image data is transmitted from the image sensor electronics to the DSP board memory. When a CCD is used, a high-speed Camera Link interface from the CCD electronics to a DSP board is needed to enable this novel image sensing and signal processing technology. No such interfaces are on the market today for the data rates required by SVIP, so developing an inexpensive CCD/DSP interface is a challenge.

Another theoretical complication is the small size of the optical elements and the associated error budget derivation.

1.2 Fast Optics Control System Timing Schematics 

The timing schematic in Figure 2 is stylized and is presented only to depict the tight timing constraints within the FOCS. It describes a 1.25 ms time slice of operations in terms of its constituent components.

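[b] i i i i i i i i f p p p p p p p p d [b]

Tics t1 to t19: ingest (i) from t1 to t9, format (f) from t9 to t10, process (p) from t10 to t18, digital command (d) from t18 to t19; [b] marks the beginning edge of the 800 Hz pulse.

Figure 2. Timing Events for a Unit of Work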

Within each 1.25 ms period, the unit-of-work interval [t1, t19] is marked by 19 tics spaced 0.0625 ms apart. The functional tics occur at t1 = 0.0 ms, t9 = 0.5 ms, t10 = 0.5625 ms, t18 = 1.0625 ms and t19 = 1.125 ms. The 1.125 ms unit-of-work interval is so short, and its activities so fast, that only a few basic events take place at the hardware level (more detailed timing will evolve with the scientific data processing algorithms):

- Generate the pulse beginning edge (b) at 800 Hz (that is, every 1.25 ms)

- Ingest (i) all sensor data, beginning at the rising edge of the 800 Hz control pulse, for almost half the period, or 0.5 ms

- Convert pixel data into computational-format values (f) in 0.0625 ms

- Process (p) the images (all data and all algorithms) for 0.5 ms

- Generate and issue a digital command (d) in 0.0625 ms.

Figure 3. FOCS Top Level Design
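The schedule above can be encoded and checked in a few lines of C. This sketch assumes the phase lengths read from Figure 2 (8 tics ingest, 1 tic format, 8 tics process, 1 tic command); the names are illustrative.

```c
#include <assert.h>

#define TIC_MS    0.0625   /* duration of one tic */
#define PERIOD_MS 1.25     /* 800 Hz control period */

/* Phase lengths in tics: ingest (i), format (f), process (p), command (d). */
enum { INGEST_TICS = 8, FORMAT_TICS = 1, PROCESS_TICS = 8, COMMAND_TICS = 1 };

/* Time (ms) of tic t_k, with t_1 = 0.0 ms. */
double tic_time_ms(int k)
{
    assert(k >= 1);
    return (k - 1) * TIC_MS;
}

/* Margin (ms) left in the 1.25 ms period after all phases complete. */
double period_slack_ms(void)
{
    int used = INGEST_TICS + FORMAT_TICS + PROCESS_TICS + COMMAND_TICS;
    return PERIOD_MS - used * TIC_MS;
}
```

With these values tic_time_ms(9) is 0.5 ms and tic_time_ms(19) is 1.125 ms, matching the functional tics listed above, and 0.125 ms of each 1.25 ms period remains as margin.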


1.3 Fast Optics Control Loop Top Level Design 


The fast optics control system is essentially an electronics "box" of high-performance RC hardware. It can be visualized as a stand-alone cPCI chassis (Figures 4, 6) that houses several cPCI boards: five DSP boards, each processing the data from a single CCD; one or two DSP boards for the computationally intensive work required by the control algorithms; and a single board computer (SBC) that serves as the host processor. CCD sensor electronics usually come with the capability to read out an image over several channels. For example, the MC13 CCD (Trademark of Microtron) has 10 channels, each 10 bits wide, and the sensor can be configured to clock out a sub-frame on any number of channels. From the SVIP performance point of view, we find it sufficient to use 2 channels for the larger sub-frames of the piston CCDs and 1 channel for the smaller sub-frames of the tip-tilt CCDs. It is necessary to dedicate one DSP board to each CCD because of the high data rates required by SVIP and for consistency of operations across the entire control system. The SBC serves both as the host that configures and synchronizes the DSPs and as the ground system computer during SVIP integration and testing. Figure 3 is the schematic diagram of the fast optics control system, and Figure 4 shows its prototype implementation.


Diagram labels: Fast Optics Sub-Subsystem Sensors: DP CCD-1 and DP CCD-2 (210 MBs each, 2 channels each); TT CCD-3, TT CCD-4 and TT CCD-5 (105 MBs each, 1 channel each); all CCDs read out on an 800 Hz sync pulse. Fast Optics Subsystem Control and Actuators: 2 DP 1-D translation actuators and 3 TT 2-D tip/tilt actuators, all under analog control at 200 Hz. SVIP Fast Optics Control Sub-System: 6U cPCI chassis with 5 DSP boards; one channel from the CCD detector electronics to each DSP board's external memory I/O; a sufficient number of DSPs for the tip-tilt and piston algorithm computations; a Pentium III single board computer for host operations on the cPCI chassis; fast actuator intelligent control algorithms running on a dedicated DSP board.

Figure 4. DSP chassis





2.0 Prototype Instrument 


The SVIP instrument (Figure 5) comprises three 10 cm telescopes and internal optics that form images on 5 CCD control sensors. The telescopes point at a sunspot, and the control system's goal is to stabilize the image for a few seconds before the target image is re-acquired on the Channel 1 telescope. A spectrometer camera captures the science data.


Figure 5. SVIP Laboratory Configuration
3.0 DSP Cluster Architecture

Besides the fact that reconfigurable computing resources such as DSP and FPGA technology provide inexpensive supercomputing performance, the cardinal issue in using them is the availability of DSP/DSP and DSP board/DSP board interfaces that are faster than the cPCI bus. Such interfaces allow multiple DSPs and DSP boards to be interconnected into clusters (Figure 6). The absence of fast image sensor/DSP interfaces that satisfy an industry-standard communications protocol, such as Camera Link, at the data transfer rates determined by the SVIP science requirements still represents a challenge. Some vendors, however, have enough resources on their DSP boards to support a fast and inexpensive CCD/DSP Camera Link protocol interface.



Figure 6. DSP Clusters Architecture 

The configuration shown was developed by NASA and a DSP board vendor as part of a business proposal and is presented here with the vendor's permission.

4.0 Algorithm Implementation for a General-Purpose Processor

There are two types of control algorithms: the piston and the tip-tilt control algorithms. Both are implemented as standalone C-language programs for a general-purpose personal computer (PC) and compiled with the Visual C++ compiler (Trademark of Microsoft Corporation). Each is a small program that can be examined and analyzed by inspection and easily timed to determine its computational performance bottlenecks. These bottlenecks are then re-implemented, again in C, but compiled with the RC hardware (DSP) compiler provided by the DSP vendor (Texas Instruments or Analog Devices Corporation) for loading onto the DSP hardware, as described below.
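The timing step can be as simple as the following sketch, which uses the standard C clock() function to average the processor time of one stage over many repetitions. The harness and the example stage are hypothetical; the actual SVIP programs and their stage boundaries are not shown in this paper.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical harness: average processor time (ms) of one algorithm
 * stage, used to find the bottlenecks worth moving to the DSP. */
double average_stage_ms(void (*stage)(void *), void *ctx, int repetitions)
{
    assert(stage != NULL && repetitions > 0);
    clock_t start = clock();
    for (int r = 0; r < repetitions; ++r)
        stage(ctx);
    clock_t stop = clock();
    return 1000.0 * (double)(stop - start) / CLOCKS_PER_SEC / repetitions;
}

/* Example stage: sum a float buffer (stands in for a real algorithm step). */
typedef struct { const float *data; int n; double result; } sum_ctx;

void sum_stage(void *ctx)
{
    sum_ctx *c = (sum_ctx *)ctx;
    double s = 0.0;
    for (int k = 0; k < c->n; ++k)
        s += c->data[k];
    c->result = s;
}
```

Stages whose average time is a large fraction of the 1.25 ms control period are the natural candidates for DSP implementation.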

5.0 Algorithm Pre-Hardware Optimization and Hardware Implementation for a Target DSP

The bottlenecks of the general-purpose processor implementation were re-implemented to run in hardware on clusters of DSPs in two versions: floating-point and fixed-point. The bottleneck source code was first subjected to pre-hardware optimization. This sometimes included an intentional change to the algorithm that may somewhat reduce its computational performance on the PC platform, but makes the code more amenable to parallel implementation in hardware and improves overall performance by an order of magnitude. In the following sections we elaborate on the pre-hardware optimization and hardware implementation of the FFT and shuffle algorithms.
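The two versions differ mainly in how numbers are represented. The C sketch below assumes a Q15 format (the paper does not state which fixed-point format was used) and shows the conversion to and from fixed point and the rescaling shift that every fixed-point multiply needs; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Q15 fixed point: a value x in [-1, 1) is stored as x * 32768,
 * rounded to the nearest integer, in an int16_t. */
int16_t float_to_q15(float x)
{
    assert(x >= -1.0f && x < 1.0f);
    float scaled = x * 32768.0f + (x >= 0.0f ? 0.5f : -0.5f);
    if (scaled > 32767.0f)  scaled = 32767.0f;   /* clamp after rounding */
    if (scaled < -32768.0f) scaled = -32768.0f;
    return (int16_t)scaled;                      /* truncates toward zero */
}

float q15_to_float(int16_t q)
{
    return (float)q / 32768.0f;
}

/* Q15 multiply: a 16x16 -> 32-bit product rescaled by a 15-bit shift.
 * A float multiply needs no such rescaling. -1 * -1 would overflow, so
 * it saturates. Assumes arithmetic right shift of negative values, as
 * on common PC and DSP compilers. */
int16_t q15_mul(int16_t a, int16_t b)
{
    int32_t product = (int32_t)a * (int32_t)b;
    if (product == ((int32_t)1 << 30))
        return INT16_MAX;
    return (int16_t)(product >> 15);
}
```

For example, 0.5 becomes 16384, and q15_mul(16384, 16384) gives 8192, which converts back to 0.25.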

5.1 Fast Fourier Transform Algorithm 

The 1-D Fast Fourier Transform implementation used by MATLAB (Trademark of MathWorks Inc.) is FFTW, the "Fastest Fourier Transform in the West" (Trademark of Massachusetts Institute of Technology). However, its executable is 600 KB in size and requires a license, and it is not feasible to fly such large software or firmware modules. A smaller (20 KB) module is needed instead, one that may be slower than FFTW but is more amenable to hardware implementation on a DSP. For this purpose we replaced FFTW on the general-purpose platform with a smaller FFT module similar to the one used on most DSPs, which can be found in the DSP optimization libraries. The DSP libraries provide an FFT only for vector (1-D) input, so we implemented a fast 2-D FFT (FFT2) using the 1-D FFT.

5.2 Shuffle Algorithm 

Shuffling (folding) is usually performed on the entire image after the FFT2 of the image is complete. If the shuffle is not random, its computational cost can be reduced by first determining the shuffle result with a small stand-alone program and then replacing the shuffle entirely with hard-coded results. In this project, after the horizontal and vertical shuffle, only a small 15x15-pixel region at the geometric center of the image is used to derive the digital commands. We stored each pixel's own coordinates in the image array and performed the shuffle. The resulting 15x15 central array records which source pixel lands at each central position, so the entire shuffle algorithm can be replaced by direct initialization of this small array.
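The offline step described above can be sketched as follows. We assume the shuffle is the usual horizontal and vertical half-swap that moves the zero-frequency term of an FFT2 to the image center (as MATLAB's fftshift does) and, for brevity, a square n x n image; the paper does not spell out the shuffle, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define CENTER 15  /* side of the central region used to derive commands */

/* Reference shuffle: swap halves horizontally and vertically in an
 * n x n row-major array (n even). */
static void shuffle_full(const size_t *in, size_t *out, size_t n)
{
    for (size_t r = 0; r < n; ++r)
        for (size_t c = 0; c < n; ++c)
            out[((r + n / 2) % n) * n + (c + n / 2) % n] = in[r * n + c];
}

/* Offline step: store each pixel's own index in the image, shuffle it,
 * and read back the central window. table[r][c] then names the source
 * pixel that the shuffle moves to central position (r, c).
 * Returns 0 on success, -1 if memory allocation fails. */
int build_center_table(size_t n, size_t table[CENTER][CENTER])
{
    assert(n % 2 == 0 && n >= CENTER);
    size_t *idx = malloc(n * n * sizeof *idx);
    size_t *shuffled = malloc(n * n * sizeof *shuffled);
    if (idx == NULL || shuffled == NULL) {
        free(idx);
        free(shuffled);
        return -1;
    }
    for (size_t k = 0; k < n * n; ++k)
        idx[k] = k;                        /* each pixel holds its coordinates */
    shuffle_full(idx, shuffled, n);
    size_t first = n / 2 - CENTER / 2;     /* top-left of the central window */
    for (size_t r = 0; r < CENTER; ++r)
        for (size_t c = 0; c < CENTER; ++c)
            table[r][c] = shuffled[(first + r) * n + (first + c)];
    free(idx);
    free(shuffled);
    return 0;
}

/* Run-time step: a 225-element gather replaces the full-image shuffle. */
void gather_center(const float *img, size_t table[CENTER][CENTER],
                   float out[CENTER][CENTER])
{
    for (size_t r = 0; r < CENTER; ++r)
        for (size_t c = 0; c < CENTER; ++c)
            out[r][c] = img[table[r][c]];
}
```

build_center_table would run once, offline, and its 15x15 result could be pasted into the flight code as a constant initializer, so that only gather_center runs inside the control loop.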

5.3 Trigonometric Functions and Radicals 

The number of trigonometric function and square root evaluations in an algorithm, as well as the number of divisions, is very important because the reconfigurable hardware has no single-cycle implementation of division. In the Misell algorithm, for example, these operations make up 51% of the computations, which accounts for its computational complexity. As described in the Introduction, such code may nevertheless, after pre-hardware optimization, be implemented in its entirety on a DSP. The pre-hardware optimization may include an approximation of the trigonometric functions, proposed by Dr. Jay Herman, that is better than a Taylor series approximation.
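Since division is singled out above, the sketch below shows one standard way to avoid it on hardware without a single-cycle divide: Newton-Raphson refinement of a reciprocal, which needs only multiplications and additions. This is a generic technique swapped in for illustration, not Dr. Herman's trigonometric approximation, which the paper does not describe; positive finite divisors are assumed.

```c
#include <assert.h>
#include <math.h>

/* Approximate 1/d without a divide instruction, using the Newton-Raphson
 * iteration x <- x * (2 - m * x) on the normalized mantissa m. */
float reciprocal_nr(float d)
{
    assert(d > 0.0f && isfinite(d));
    int e;
    float m = frexpf(d, &e);                       /* d = m * 2^e, m in [0.5, 1) */
    /* Linear initial guess for 1/m; the constants are folded at compile time. */
    float x = 48.0f / 17.0f - 32.0f / 17.0f * m;
    for (int i = 0; i < 3; ++i)
        x = x * (2.0f - m * x);                    /* relative error roughly squares */
    return ldexpf(x, -e);                          /* 1/d = (1/m) * 2^-e */
}
```

The initial guess is within about 1/17 of 1/m, so three iterations reach single-precision accuracy. On a DSP the normalization would typically be done by manipulating the exponent bits directly rather than by calling frexpf and ldexpf.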

6.0 Representative Runs and Results

We were able to run the algorithms in real time, control the 
fast optics mirrors and track the target to our satisfaction. 

Conclusions 

We have presented a first-of-its-kind top-level design for a fast optics control system based on rapid on-board image sensing for the SVIP instrument, including its input-output and computational architectures, together with an implementation road map for such a system using reconfigurable computing technologies. We are presently building this instrument and developing a proposal for a future space mission. We hope that this work will advance spatial-spectral imaging and large-scale space interferometry science and engineering.